INTRODUCTION

We are interested in devices which understand and act upon spoken input from people. Traditionally, in such speech understanding systems, the hierarchy of linguistic symbols and structures has been manually constructed, involving much labor and leading to fragile systems which are not robust in real environments. In human language acquisition, however, the phonemes, vocabulary, grammar, and semantics seem to emerge naturally during the course of interacting with the world. This contrast motivates us to investigate devices which automatically acquire the language for their task, during the course of interacting with a complex environment. While a long-term investigation, research in such language acquisition devices yields insights into how to construct speech understanding systems which are trainable, adaptive, and robust. The purpose of this paper is to recount our progress and ideas to date in this endeavor. In particular, we describe the principles and mechanisms underlying this research and review several experimental systems which have been constructed.

A first principle in our research is that the purpose of language is to convey meaning, so that language acquisition crucially involves learning to decode that meaning. A second principle is that language is acquired during interaction with a complex environment, wherein the device receives some input stimuli, responds to that input, then receives feedback as to the appropriateness of its response. These principles underlie our investigation into connectionist mechanisms, in which a network constructs associations between input stimuli and appropriate machine responses. We embed these networks in a control-theoretic mechanism for governing language acquisition via reinforcement learning. If the reinforcement feedback is positive, then the associations are strengthened, while negative reinforcement causes the associations to be weakened.

A system block diagram based on these principles is shown in Fig. 1. The device receives some input, comprising linguistic and possibly other stimuli. In response to this input it performs some action, to which the environment provides a semantic-level error signal as to the appropriateness of that response. The device then adapts its behavior based on this error feedback. Thus we assume that the system will make occasional errors, especially when encountering unfamiliar stimuli. The emphasis is on the ability to detect an error via reinforcement feedback from the environment, to recover from the error via feedback control, then finally to learn from the error so that it is not repeated.

The goal of man-machine communication, in such a system, is to induce the machine to undergo some transformation. This transformation can be immediately observable in the form of some machine action, or can be an internal state change which is only observable indirectly on some future interaction. We denote the input to the device as language and the mapping from input to transformation as understanding. We are then satisfied that the machine understands if it responds appropriately over a wide range of input scenarios, which is essentially a reformulation of the Turing test.

It is worthwhile contrasting this paradigm with traditional communication theory, best exemplified by a quotation from Shannon's original paper (emphasis added) (Shannon, 1948):

"The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem."

In contrast, for systems which purport to understand spoken language, the semantic aspects of communication are primary. How then can we quantify such notions? In people, input stimulus evokes memories of associated perceptions and activities. We thus propose a third principle, that meaning is grounded in a device's interaction with its environment. This principle underlies an investigation of methods to quantify the meaning of spoken language via its network associations to a device's input/output periphery, providing an acquired representation of the device's operational environment. Introducing a metric and norm on these associations forms the basis of a salience theory, which quantifies the information content of an input stimulus for a particular device.

FIG. 1. Feedback control and learning in language acquisition.
In the remainder of this Introduction, we motivate and outline the algorithmic methods which have proved useful in our investigation of such language acquisition devices. The body of the paper will describe in detail these mechanisms and experimental evaluations thereof.

An information-theoretic network. For the system illustrated in Fig. 1, the question arises of how to implement the mapping from input stimuli to action. We have proposed constructing connectionist networks which build associations between input stimuli and appropriate machine responses (Gorin et al., 1991). If a machine receives positive reinforcement, then the connections are strengthened, while negative reinforcement causes the connections to be weakened. There are many methods that have been proposed in the literature for learning such connection weights. In particular, we define the connection weights of these networks via mutual information, which has a variety of theoretical and practical advantages over gradient-based training methods. The definition and properties of such information-theoretic networks will be described in Sec. I.

Our earliest experiments involved a single-layer information-theoretic network which constructs direct associations between words and meaningful machine actions. This simple architecture corresponds to a "bag-of-words" language model. It was then extended to a multi-layer network which in addition builds associations between phrases and actions, acquiring a rudimentary syntactic structure to improve understanding.

Those early experiments involved a text-based Automated Call Routing system (Gorin et al., 1991), then a speech-input version of that system (Gorin et al., 1994a). That scenario involved a Department Store, which receives input such as I need some paint for my redwood table, whence the appropriate response is to route the caller to the Hardware Department. More recently, these methods were applied to a database of actual customer/operator dialogs from the AT&T telephone network (Gorin et al., 1993b) (Gorin et al., 1994b) (Sankar et al., 1993). In this scenario, an input might be I want to reverse the charges, whence the appropriate response is to route the caller to an automated subsystem which handles collect calls. These experimental systems will be described in Sec. III.

Structured networks. As a device and its task become more complex, so does the mapping from input stimuli to machine action. Given some network architecture, a reasonable question is whether it is capable of learning such complex mappings. A striking feature of human language acquisition is our ability to make sweeping generalizations from small numbers of observations. For example, a single observation of a new word, in the appropriate context, can suffice to acquire its pronunciation, syntactic role, and semantic associations.

We observe that the neural network in a biological organism is not homogeneous, but rather highly structured and modular. Such structure develops over evolutionary time, matching itself to a species' sensory-motor periphery and environment. One can hypothesize that the constraints provided by such network structure correspond to the innate characteristics which enable an individual organism to rapidly adapt to its environment [citation illegible]. This introduces a fourth principle, that in order to provide rapid learning and generalization a device must reflect the structure of its input/output periphery and environment. As observed in (Minsky and Papert, 1990),

"The marvelous powers of the brain emerge not from any single, uniformly structured connectionist network but from highly evolved arrangements of smaller, specialized networks which are interconnected in very specific ways."

Thus motivated, we have investigated structured network architectures whose constraints greatly accelerate the learning process. In particular, we developed several methods for constructing large structured networks by combining component subnetworks, then experimentally evaluated those networks in several application scenarios.

In a Call Routing task, the set of machine actions is merely a list, corresponding to a particularly simple output periphery structure. One can consider the more general situation where the machine actions comprise an r-parameter set of subroutine calls. Miller has proposed the construction of product networks for such devices, where individual subnetworks are allocated to each output parameter, thereby reflecting the output periphery structure in its network architecture (Miller and Gorin, 1993c). This product network enables improved generalization by factoring phrase/action associations through intermediate semantic primitives. These ideas will be expanded and detailed in Sec. IV.

A two-dimensional product network has been experimentally evaluated on an Almanac data retrieval task, first text-based (Miller and Gorin, 1993c), then speech-based (Gorin et al., 1993a) (Miller and Gorin, 1993b). The system responds to inputs such as What is the largest mountain in the Empire State?, to which the appropriate machine response is The highest point in New York State is Mt. Marcy (5,344 ft). This experimental system will be described in Sec. IV.

In many situations of interest, the appropriate machine response to a spoken input depends not only on the message, but also on the state of its environment. This motivates us to investigate devices with both linguistic and other input channels. Such extra-linguistic information can serve to resolve ambiguities during understanding as well as provide redundancy to accelerate language acquisition. For such devices, Sankar has proposed the construction of sensory primitive subnetworks, which learn the cross-channel associations between different input stimuli (Sankar and Gorin, 1993). These subnetworks are combined in a product architecture, reflecting a device's input periphery structure, whose output is then used to control the machine actions. This architecture provides improved generalization by factoring phrase/action associations through the sensory primitive subnetworks.

Sankar evaluated this network and control strategy in a Blocks World scenario, in which the machine has both linguistic and visual input channels (Sankar and Gorin, 1993). It is presented with a scene comprising objects of varying color and shape. The machine actions comprise focusing its attention on a particular object in response to input such as Where is the red square? Such focus of attention is a necessary prerequisite to more complex actions. More recently, Henis has extended this system, connecting it to a robotic simulator where the machine actions comprise manipulating the blocks upon which it has focused its attention (Henis et al., 1994). These experimental systems will be discussed in Sec. IV.

Symbols from signals. In order to provide rapid learning and generalization in a language acquisition device, we have explored methods for reflecting the structure of a device's input/output periphery in its network architecture. There is also structure in the environment, in that the input signals to a device can be organized into symbols, then further organized into hierarchical structures. While such structure can be imposed on all sensory inputs, in this research we focus on linguistic symbols and structures. We believe, however, that our methods are general and can be applied to all cognitive modalities.

We focus initially on words, which are the fundamental symbols of meaning in language. In traditional speech recognition systems, one specifies the vocabulary a priori then trains the recognizer by presenting it with labeled speech (Rabiner and Juang, 1986). During human language acquisition, however, words seem to emerge naturally during the course of interacting with the world. How might this be? Furthermore, how might we mimic such characteristics in our devices so as to improve their trainability, adaptability, and robustness?

This contrast motivates us to investigate methods for automated acquisition of spoken words. Webster (Webster, 1987) defines a word as

"a speech sound ... that communicates meaning without being divisible into smaller units capable of independent use."

Based on the intuition that a symbol should be a stable point of some operator, we have investigated clustering algorithms that search for speech sounds which are acoustically and semantically consistent. A prerequisite to measuring such consistency is to define acoustic and semantic feature spaces with appropriate metrics.

In people, an input stimulus evokes memories of associated perceptions and activities. This motivated us to define the meaning of a word, for a particular device, to be its network associations to the device's input/output periphery. Such a definition grounds meaning in a device's interaction with its world, being dependent on its input/output periphery, environment and experiences. In Sec. V we will describe illustrative examples of such semantic/sensory associations for several experimental devices.

3443 J. Acoust. Soc. Am., Vol. 97, No. 6, June 1995

We have defined a distance between those association vectors, measuring the semantic similarity of two words for a device. We furthermore define a norm, which measures the semantic significance of an individual word or phrase. This norm measures the information content of a word for the device, which we denote salience. This can be distinguished from and compared to the traditional Shannon measure of information content, which measures the uncertainty that a word will occur. These theoretical and empirical relationships will be discussed in Sec. V.

Based on these ideas, we constructed and evaluated a rudimentary spoken language understanding system (Gorin et al., 1994a). It is unique in that no text is provided to the device during either testing or training, in contrast to all other speech understanding systems. It is also unique in that the vocabulary and grammar are unconstrained, being acquired by the device during the course of performing its task. This is also in contrast to all other systems, where the salient vocabulary words and their meanings are explicitly provided to the machine. The initial application vehicle for this experiment in spoken language acquisition was the Department Store task (Gorin et al., 1994a), then the Almanac (Gorin et al., 1993a) (Miller and Gorin, 1993b). These experimental systems will be described in Sec. VI.

Grammatical inference. The above experiments focused on acquiring word symbols from the speech signal. The next level up in the linguistic hierarchy is grammar, comprising symbols and structure which govern the acceptable combinations of words into sentences. Grammar plays two important roles in speech understanding. First, it constrains the allowable word sequences, increasing the signal-to-noise ratio and thus improving our ability to recognize words in noisy or highly variable environments. Second, it modifies the meaning of a word according to its position in a sentence. Thus meaning can be viewed as an attribute of a word in a particular syntactic state, rather than of the word alone.

The automated acquisition of grammar has received much attention, intertwined with the classical debate concerning how much of linguistic structure must be innate in order to account for human behavior. We observe, however, that the acquisition of grammar from merely listening to speech is a much harder problem than people actually solve. In humans, language is acquired during the course of interacting with the world, exploiting both speech and other sensory input. A challenge, then, is to understand how to exploit such extra-linguistic information to guide grammatical inference, governed by the goal of learning to decode meaning.

As motivation, let us consider the basic parts-of-speech such as nouns and verbs. In elementary school, children are taught that a noun is a person, place, or thing, and that a verb is a word that expresses an action. It is striking that the classroom definitions of such fundamental syntactic concepts are purely semantic. If one constructs a machine that can interact with things, for example in a Blocks World, then all phrases with high salience for such things can be clustered into a part-of-speech. Such an abstraction would correspond to the early semantic characterization of a noun. Similarly, given a machine which can sense the attributes of things (e.g., color or shape), then one could acquire a part-of-speech corresponding to the early notion of an adjective. Verbs could similarly be emergent from associations to time derivatives of such attributes.

Allen Gorin: Automated language acquisition 3443

While such definitions are the subject of much debate in linguistics, they serve as useful intuitions to motivate our investigation into exploiting semantic/sensory associations for grammatical inference. In Sec. VII, we will first describe the method of salience thresholding in an information-theoretic connectionist network. This thresholding yields a subnetwork which corresponds to a part-of-speech for each dimension of the device periphery. This subnetwork is activated only by those words or phrases which are highly salient for its semantic or sensory primitive.

Once these parts-of-speech are acquired they can be manipulated just like any other symbol. We thus propose a fifth principle, that language acquisition proceeds in developmental stages: from the concrete to the abstract, from the simple to the complex. In adult language, parts-of-speech are characterized both by their meaning and by their within-sentence usage patterns. In Sec. VII, we also report on preliminary experiments which re-estimate induced parts-of-speech so that they become consistent from both these perspectives.

An application of these ideas was explored by Gertner, who constructed a hierarchical network with subnetworks corresponding to parts-of-speech in an Airline Information task (Gertner and Gorin, 1993). A query to that system might be I want to leave New York and fly to the Windy City, to which the appropriate machine response would be to display a flight table from New York to Chicago. The principle of developmental learning tells us that in order for a device to acquire the language involving pairs of places, it must first acquire the language associated with individual places.

A stable subnetwork for places was embedded in a hierarchical network, with secondary subnetworks corresponding to modifier phrases. Rapid learning and generalization was achieved by factoring phrase/action associations through these modifier subnetworks. For example, an encounter with the phrase leave New York leads to the acquisition of the meaning of leave as it relates to all place names.

Summary. The principles and mechanisms presented here form the basis of a theory of syntax and semantics, where conveying meaning is primary and linguistic structure serves to make such communication robust. Although our experimental devices are thus far rudimentary, we consider them to be the early stages of a long-term investigation into machines which automatically acquire language through interaction with a complex environment.

This paper proceeds as follows. Section I defines the basic information-theoretic network, its training procedure and basic properties. The feedback control mechanism used for dialog control is described in Sec. II. In Sec. III, we describe the experimental evaluation of that basic network in Call Routing tasks. Structured networks are discussed in Sec. IV, in particular their application to the Almanac and Blocks World tasks. In Sec. V, we define salience, discussing its relationship to information theory and providing several illustrative examples of semantic/sensory association vectors. Our experiments in spoken language acquisition are summarized in Sec. VI, where the device acquires spoken words from speech with no intervening text. In Sec. VII, rudimentary experiments in salience-based grammatical inference are discussed. The application of those ideas to an Airline Information task will then be described, involving a structured hierarchical information-theoretic network.

FIG. 2. A multilayer network mapping language to semantic actions.

I. AN INFORMATION-THEORETIC CONNECTIONIST NETWORK

In this section, we describe a mechanism for learning the mapping from input stimuli to machine action. In particular, we describe a connectionist network, originally proposed in (Gorin et al., 1991), which builds associations between input stimuli and appropriate machine responses. If the machine receives positive reinforcement to a response, then the connections are strengthened, while negative reinforcement causes the connections to be weakened.

The basic network architecture is illustrated in Fig. 2. A spoken or typed sentence is applied to the input layer, which comprises a collection of word-detector nodes. These nodes produce an output between zero and one, approximating the probability that a particular word is present in the input. In the simplest case of keyboard input, the output equals one if the input word exactly matches the node, else zero.

The intermediate layer comprises a collection of phrase-detector nodes, in the simplest case corresponding to adjacent word pairs (bigrams). The output layer comprises nodes which correspond to the various actions that the machine can perform. In this discussion, the action space is a list of subroutine calls. The extension of these methods to more complex devices via structured networks is discussed in Sec. IV.

The input and intermediate layers are shown partially grown. They are initially empty, growing over time as new words and phrases are encountered. In the case of keyboard input, a word token can be defined as any character sequence delimited by blanks or punctuation. A new word can be most simply defined as one which differs in any way from the existing vocabulary (Gorin et al., 1991). This new-word criterion can be softened ([first author illegible] and Gorin, 1993) based on string-distortion measures such as the Levenshtein distance (Sankoff and Kruskal, 1983). In Sec. VI, we address the related problem for spoken input (Gorin et al., 1994a), which involves both acoustic and semantic distortions.

There have been a number of methods proposed in the literature for training such networks (Rumelhart and McClelland, 1988). In this research, we have defined the connection weights between words and actions to be the mutual information between those events (Cover and Thomas, 1991). This leads to several attractive properties discussed later in this section. If we denote the current vocabulary of $N$ words by $V = \{v_1, v_2, \ldots, v_N\}$ and the set of $K$ actions by $C = \{c_1, c_2, \ldots, c_K\}$, then the information-theoretic connection weights are given by

$$ w_{nk} = I(v_n, c_k) = \log \frac{P(c_k \mid v_n)}{P(c_k)}, \tag{1} $$



where $P(c_k \mid v_n)$ is the conditional probability that a sentence containing word $v_n$ connotes action $c_k$, where $P(c_k)$ is the prior probability of that action, and $I(v_n, c_k)$ is the mutual information between the word and action. This is intuitively satisfying as follows. If the presence of word $v$ in a sentence makes an action $c$ more likely, then $P(c \mid v) > P(c)$, so that the connection weight is positive (excitatory). Similarly, if the word $v$ makes an action $c$ less likely, then the connection weight is negative (inhibitory). Finally, if the word has no effect, then the conditional and prior probabilities are equal, so that the connection weight is zero (null).
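
The three cases above (excitatory, inhibitory, null) can be checked numerically. The following is a minimal sketch, assuming the word/action weight has the form log(P(c|v)/P(c)); the function name and the probability values are hypothetical and chosen only for illustration.

```python
import math


def mi_weight(p_action_given_word: float, p_action: float) -> float:
    """Connection weight between a word v and an action c: log(P(c|v) / P(c))."""
    return math.log(p_action_given_word / p_action)


# Hypothetical prior: one of four equally likely actions.
prior = 0.25

excitatory = mi_weight(0.80, prior)  # the word makes the action more likely
inhibitory = mi_weight(0.05, prior)  # the word makes the action less likely
null = mi_weight(0.25, prior)        # the word carries no information
```

The sign of each weight depends only on whether the conditional probability lies above or below the prior, which is the property the paragraph above relies on.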

The connection weight between a phrase and action is defined via excess mutual information. While in principle scalable to any n-gram phrase or set thereof, we restrict this discussion to adjacent word pairs $(v_i, v_j)$,

$$ w_{ijk} = I(v_i v_j, c_k) - I(v_i, c_k) - I(v_j, c_k). \tag{2} $$

Biases for each output node (cf. Fig. 2) are given by

$$ w_k = \log P(c_k). \tag{3} $$

The activation at each output node is computed via a linear combination of those inputs,

$$ a_k = w_k + \sum_{n} b_n w_{nk} + \sum_{i,j} d_{ij} w_{ijk}, \tag{4} $$

where $d_{ij}$ is the output from the phrase-detector node for $v_i v_j$ and $b_n$ is the output of the word-detector node for $v_n$. That action $c_{k^*}$ which has maximum activation is then performed, where

$$ a_{k^*} = \max_k a_k. \tag{5} $$
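
The decision rule described above, a bias plus a linear combination of word-detector and phrase-detector outputs followed by a maximum over actions, can be sketched as follows. This is an illustration only, assuming keyboard input with binary detectors (output one if the word or adjacent word pair is present, else zero); the function names, the dictionary layout, and all weight values in the usage note are hypothetical and are not taken from the experimental systems.

```python
from typing import Dict, List, Tuple


def activations(sentence: str,
                bias: List[float],
                word_w: Dict[str, List[float]],
                phrase_w: Dict[Tuple[str, str], List[float]]) -> List[float]:
    """Return one activation per action: bias + word terms + bigram terms."""
    tokens = sentence.lower().split()
    present_words = set(tokens)                   # binary word detectors
    present_pairs = set(zip(tokens, tokens[1:]))  # binary adjacent-pair detectors
    a = list(bias)
    for v in present_words:
        # Words outside the current vocabulary contribute nothing.
        for k, w in enumerate(word_w.get(v, [])):
            a[k] += w
    for pair in present_pairs:
        for k, w in enumerate(phrase_w.get(pair, [])):
            a[k] += w
    return a


def choose_action(a: List[float]) -> int:
    """Index of the action with maximum activation."""
    return max(range(len(a)), key=a.__getitem__)
```

With two hypothetical actions (index 0 for a hardware department, index 1 for a clothing department), biases `[-0.7, -0.7]`, word weights `{"paint": [1.2, -1.5]}` and phrase weights `{("red", "paint"): [0.3, -0.2]}`, the input "I need red paint" yields a higher activation for action 0, so `choose_action` selects it.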



Theoretical properties. The information-theoretic network has several relationships to other methods, which we briefly review here. Given suitable Markov assumptions on the language, then the above algorithm is equivalent to a maximum a posteriori (MAP) decision (Gorin et al., 1991). Given suitable independence assumptions, then a bag-of-words model applies and the connections from the intermediate layer vanish, yielding a single-layer network. Under these conditions, the network is again equivalent to a MAP decision (Gorin et al., 1991). Tishby observes that the information-theoretic network is equivalent to the strategy of classification via minimum description length, where that interpretation is selected which provides for the minimum code length of the input sentence (Tishby and Gorin, 1994).
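
The single-layer MAP equivalence can be made concrete with a short derivation. This is a sketch under stated assumptions rather than the proof given in the cited work: it assumes that the words $v_1, \ldots, v_L$ present in a sentence $s$ are conditionally independent given the action, that only present words contribute, and that the output bias is $\log P(c_k)$.

```latex
\begin{align*}
\log P(c_k \mid s)
  &= \log P(c_k) + \sum_{l=1}^{L} \log P(v_l \mid c_k) - \log P(s) \\
  &= \log P(c_k) + \sum_{l=1}^{L} \log \frac{P(c_k \mid v_l)}{P(c_k)}
     + \Bigl[\, \sum_{l=1}^{L} \log P(v_l) - \log P(s) \Bigr] \\
  &= a_k + \text{(terms independent of } k\text{)} .
\end{align*}
```

The second line uses Bayes' rule, $P(v_l \mid c_k) = P(c_k \mid v_l)\, P(v_l) / P(c_k)$. Since the bracketed term does not depend on $k$, maximizing the activation $a_k$ over actions selects the same action as maximizing the posterior $P(c_k \mid s)$.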

Addressing the problem of rule-inference for expert systems, Goodman shows that the strength of a rule can be characterized by the mutual information between its if and then clauses (Goodman et al., 1992). He furthermore describes a method for combining the parallel firings of such rules via a connectionist network with information-theoretic weights. This result is rather satisfying, ameliorating the traditional debate between connectionist and rule-based approaches to machine intelligence.

Tishby proves a universality theorem for information-theoretic associations, showing that, under suitable hypotheses, any association measure which is functionally related to probabilities can be rescaled to mutual information (Tishby and Gorin, 1994). There is a seemingly related set of results proving that when appropriately trained via a mean-squared error (MSE) criterion, the network outputs provide estimates of a posteriori probabilities (Richard and Lippmann, 1991). We remark that, while all these relationships are quite fascinating, fully understanding and exploiting them remains an issue for future research.

Estimation. After having decided on the information-theoretic network, the issue remains of how to estimate mutual information. The probabilities in formulas (1) through (3) can be estimated via smoothed relative frequencies (Gorin et al., 1991). In particular, after encountering $T$ input sentences $s_1, s_2, \ldots, s_T$, let $N(c_k, v_n)$ denote the number of sentences of class $c_k$ containing word $v_n$, let $N(c_k)$ denote the number of sentences in that class, and let $N(v_n) = \sum_k N(c_k, v_n)$ denote the number of sentences containing $v_n$. We compute (Gorin et al., 1991) smoothed relative frequency estimates of $P(c_k)$ and $P(c_k \mid v_n)$ via

$$ \hat{P}(c_k) = (1 - \alpha_T)\,\frac{1}{K} + \alpha_T\,\frac{N(c_k)}{T}, \tag{6} $$

$$ \hat{P}(c_k \mid v_n) = (1 - \beta_n)\,\hat{P}(c_k) + \beta_n\,\frac{N(c_k, v_n)}{N(v_n)}. \tag{7} $$

The interpolation parameters $\alpha_T$ and $\beta_n$ are set to $T/(m+T)$ and $N(v_n)/(m+N(v_n))$, respectively, for some fixed prior mass $m$. These estimators have a naturally incremental implementation, either via updating counters or via maintaining sufficient statistics (Duda and Hart, 1973). Furthermore, so long as the meaning of words is fixed over time, then the relative frequencies converge to the probabilities in those formulas. In contrast, network training methods based on total mean-squared error (MSE) involve batch-mode algorithms such as singular value decomposition (SVD) or multipass algorithms such as gradient search (Rumelhart and McClelland, 1988). Observe also that for MSE methods, one must define a smooth distortion measure on the output space, which is not necessary for the information-theoretic network. One can prove, however, that for an appropriate error function, the information-theoretic update vector has the same sign components as the gradient of the single-step error (Gorin and Levinson, 1989), i.e., they are moving in the same general direction. One can further prove that the information-theoretic update is guaranteed to decrease that single-step error function (Gorin and Levinson, 1989). A natural next step would be to extend this result to the global function, but it is not clear how to do so.
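
The incremental, counter-based estimation mentioned above can be sketched as follows. This is an illustration under explicit assumptions, not the implementation used in the experiments: it assumes the prior estimate interpolates between a uniform value 1/K and the relative frequency N(c_k)/T with weight T/(m+T), and that the conditional estimate interpolates between that prior estimate and N(c_k, v_n)/N(v_n) with weight N(v_n)/(m+N(v_n)). The class name, method names, and default prior mass are hypothetical.

```python
from collections import Counter


class SmoothedEstimator:
    """Counter-based smoothed relative-frequency estimates (illustrative sketch)."""

    def __init__(self, num_actions: int, prior_mass: float = 1.0) -> None:
        self.K = num_actions
        self.m = prior_mass
        self.T = 0                 # number of sentences observed so far
        self.n_action = Counter()  # N(c_k)
        self.n_word = Counter()    # N(v_n)
        self.n_joint = Counter()   # N(c_k, v_n)

    def observe(self, sentence: str, action: int) -> None:
        """Update counters from one (sentence, correct action) pair."""
        self.T += 1
        self.n_action[action] += 1
        for v in set(sentence.lower().split()):  # count each word once per sentence
            self.n_word[v] += 1
            self.n_joint[(action, v)] += 1

    def p_action(self, k: int) -> float:
        """Smoothed estimate of P(c_k); uniform 1/K before any data is seen."""
        alpha = self.T / (self.m + self.T)
        rel = self.n_action[k] / self.T if self.T else 0.0
        return (1.0 - alpha) / self.K + alpha * rel

    def p_action_given_word(self, k: int, v: str) -> float:
        """Smoothed estimate of P(c_k | v); falls back to the prior for unseen words."""
        n_v = self.n_word[v]
        beta = n_v / (self.m + n_v)
        rel = self.n_joint[(k, v)] / n_v if n_v else 0.0
        return (1.0 - beta) * self.p_action(k) + beta * rel
```

Each observation touches only a handful of counters, so the estimates can be refreshed after every sentence without revisiting earlier data. A word seen only once is pulled toward the prior by the prior mass, which limits the influence of small-sample artifacts.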

Small sample artifacts are a ubiquitous issue in statistical language models. Zipf's law is a well-known empirical observation which tells us that, in general, there will be many low-frequency events and only a few high-frequency events [citation illegible] (Zipf, 1949). There are methods which attempt to ameliorate this problem, such as Good-Turing estimators (Good, 1953), used by Rose to estimate mutual information in his topic spotting experiment (Rose et al., 1991).

Due to issues of small sample statistics and context dependency, we are led to investigate focused learning, where one would like to adjust the learning rate for a word based on its context. We illustrate this issue with an example from the Department Store system. Consider an input sentence I want to buy an etagere, where the word etagere is encountered for the first time. Given that the appropriate action is to connect the caller to the Furniture Department, one should greatly strengthen the association between etagere and that action. In contrast, consider a second input sentence I want to buy a mauve sweater, where the word mauve is encountered for the first time and the appropriate action is to connect the caller to the Clothing Department. In this example, however, one should learn only a mild association between mauve and that call-action. To summarize, depending on context, one would like to accelerate the learning rate for some words and decrease it for others.

We consider how to quantify and exploit this intuition following (Tishby and Gorin, 1994). Observe that etagere is the only possible explanation for the interpretation of the first sentence, while mauve is not necessary to correctly interpret the second sentence. That is, the error in understanding is large in one case, small in the other. Thus one would like to modulate the learning rate based on the error, a well-understood principle in MSE optimization algorithms. Tishby observes that there are both algebraic and statistical structures on the network's parameters (Tishby and Gorin, 1994). The algebraic properties can be addressed via MSE methods and the statistical properties via relative frequencies. He combines these, proposing an algebraic method for estimating statistical associations, often obtaining good estimates for words which occur only once.

Farrell investigates a gradient solution to the algebraic formulation, achieving similar performance to the information-theoretic network on the Department Store task (Farrell et al., 1993). That work addressed only the algebraic formulation, not considering the statistical structure. Gertner (Gertner et al., 1993) describes a hybrid approach leading to improved results, using mutual information as an initial estimator followed by MSE optimization. We conjecture that this result can be explained via the dual structure on the parameter space. We close this discussion by observing that, while very promising, this line of thought on exploiting algebraic structure to improve statistical associations remains an open research issue both theoretically and empirically. In Sec. VII we present an alternate approach to focused learning, reporting on preliminary experiments in exploiting estimated syntactic state to adjust learning rate.

It can be shown that when words or phrases have uniformly weak associations, then the estimates of their connection weights have increased variance, thereby injecting additional noise into the understanding process. This leads us to consider clipping to zero the weights of those words with weak associations. In (Gorin et al., 1991), this was implemented by clipping the estimates of $P(c_k|v)$ to $P(c_k)$ if they were sufficiently close. We have recently introduced an improved method to address this problem, clipping the connections of low-salience words to zero. This salience thresholding both reduces the effective vocabulary and increases the understanding rate, as discussed later in Sec. V. Subvocabulary selection is of great relevance to designing and evaluating the speech recognition front-end of our systems.
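The clipping of weak associations can be illustrated with a minimal sketch. It assumes that a connection weight is the mutual information $I(v,c) = \log_2 P(c|v)/P(c)$, so that clipping a weight to zero amounts to replacing the estimate of $P(c|v)$ by $P(c)$. The function names, the probabilities and the threshold of 0.1 are hypothetical illustrations, not values from the experiments.

```python
import math

def mutual_information(p_action_given_word, p_action):
    """Association weight I(v, c) = log2(P(c|v) / P(c))."""
    return math.log2(p_action_given_word / p_action)

def clip_weak_weights(weights, threshold):
    """Zero out every weight whose magnitude falls below `threshold`.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    return [0.0 if abs(w) < threshold else w for w in weights]

# Hypothetical estimates for one word and three call-actions.
prior = [0.25, 0.25, 0.50]       # P(c)
posterior = [0.26, 0.04, 0.70]   # P(c|v), estimated from few samples
weights = [mutual_information(p, q) for p, q in zip(posterior, prior)]
clipped = clip_weak_weights(weights, threshold=0.1)
```

In this example the first weight is close to zero because $P(c|v)$ is close to $P(c)$, so it is clipped, while the strongly negative and moderately positive weights are left unchanged.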

Summary. In this section we have described an information-theoretic connectionist network, which is the basic building block of our language acquisition systems. Several observations are in order. First, in all of our experiments, the vocabulary and parameter space grow over time as new words are encountered. Second, the networks are embedded in a dialog control system, adapting their parameters based on reinforcement feedback from the environment. Third, as described in Secs. IV and VII, this basic network is embedded in larger structured networks to enable language acquisition for more complex devices.

II. DIALOG CONTROL

We govern the behavior of a device based on feedback as to the appropriateness of its actions. Such reinforcement feedback causes an immediate modification of the device's behavior (control) and then a modification of the device's future behavior (learning).

In our communication paradigm, input is provided by a person, whose goal is to induce the machine to perform some action. The interaction between human and machine is called dialog, which serves the important role of resolving ambiguities and misunderstandings. This interaction between the machine and its environment is implemented as a feedback control system, as was illustrated in Fig. 1. In this section, we describe the basic dialog control mechanism used in our systems.

The initial input to the system is a natural language request for the machine to perform some action. Based on the machine response, the user responds in turn with a mixture of error feedback plus possibly clarifying information. Examples of such dialogs will be provided for each of our experimental systems in subsequent sections. In this section we describe the formulas underlying the most basic system, then comment upon its properties and extensions.

Let $s_l$ denote the $l$th user input, $a(s_l)$ the activation vector produced by the network [cf. formula (4)], and $e(s_l)$ the error component of the message. Let $c_{k_l}$ denote the machine response after the $l$th input message. Define a total activation array at each stage of the dialog via

$$A_l = (1-\alpha_l)\,A_{l-1} + \alpha_l\,a(s_l) + e(s_l). \qquad (8)$$

In the simplest case (Gorin et al., 1991), we set the components of $e(s_l)$ to zero, except for [illegible], which is set to [illegible]. The most basic implementation (Gorin et al., 1991) sets $\alpha_l$ [illegible] so that $A_l$ involves an average of the terms $a(s_1), \ldots, a(s_l)$. The components of $A_l$ are denoted $A_l(k)$, and the machine's response after the $l$th input involves the action $c_{k_l}$ given by

[equation illegible] (10)
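The recursion for the total activation array, $A_l = (1-\alpha_l) A_{l-1} + \alpha_l a(s_l) + e(s_l)$, can be sketched as follows. Several details are assumptions made only for illustration, since they cannot be read from this copy: the error term penalizes the previously proposed action, the penalty value is -1.0, $\alpha_l = 1/l$ (a running average), and the response is chosen as the action with the largest total activation. The activation values are invented.

```python
def update_total_activation(prev_total, activation, error, alpha):
    """One step of A_l = (1 - alpha) * A_{l-1} + alpha * a(s_l) + e(s_l)."""
    return [(1.0 - alpha) * A + alpha * a + e
            for A, a, e in zip(prev_total, activation, error)]

def choose_action(total):
    """Assumed decision rule: index of the largest total activation."""
    return max(range(len(total)), key=lambda k: total[k])

ACTIONS = ["furniture", "clothing", "hardware"]
PENALTY = -1.0  # hypothetical error value applied to a rejected action

total = [0.0, 0.0, 0.0]

# First input: an unfamiliar word leaves the network weakly favoring clothing.
total = update_total_activation(total, [0.1, 0.4, 0.0], [0.0, 0.0, 0.0], alpha=1.0)
first = choose_action(total)

# Second input: "No, it's a kind of furniture."
# The error component penalizes the previous proposal (an assumption).
error = [0.0, 0.0, 0.0]
error[first] = PENALTY
total = update_total_activation(total, [0.8, 0.0, 0.0], error, alpha=0.5)
second = choose_action(total)
```

With these numbers the first proposal is clothing and, after the negative reinforcement and clarifying input, the second proposal is furniture.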



Extensions. This basic algorithm has undergone several extensions. In our earliest implementation (Gorin et al., 1991), reinforcement feedback was provided only by the user. Miller showed how to internally generate reinforcement feedback using confidence models (Miller and Gorin, 1993a,c). For systems with multidimensional action spaces, Miller also described how to focus the reinforcement feedback on one or more of the semantic primitive actions (Miller and Gorin, 1993c). In systems with multisensory input, Sankar showed how to combine reinforcement from the user and environment (Sankar and Gorin, 1993). For long dialogs where a sequence of actions is desired, Sankar presented an algorithm which segments long dialogs into regions of stationary intent (Sankar and Gorin, 1993). These issues will be illustrated in the experiments of Sec. III, and the interested reader is referred to Miller and Gorin (1993a,c) and Sankar and Gorin (1993) for further details.
Convergence. In evaluating a language acquisition system, the primary measure of performance is how often a device responds appropriately to an input stimulus. We expect, however, our systems to make occasional errors, especially when encountering unfamiliar input, leading to an emphasis on error detection, recovery and learning. This perspective led us to further evaluate how many interactions are required for the user and machine to negotiate an appropriate machine action, which we call dialog length. In Gorin et al. (1991), a simple dialog convergence model was presented. It predicts that the probability of a dialog converging within L interactions approaches unity exponentially fast with L. Empirical evidence supporting that convergence model was reported for the Almanac (Miller and Gorin, 1993c) and Blocks World (Sankar and Gorin, 1993) systems.

III. APPLICATION TO AUTOMATED CALL ROUTING

In this section, we describe experiments which involve the single-layer information-theoretic network. In particular, consider the task of Automated Call Routing in a telecommunications network. The goal is that when a person desires some service, he would dial a single universal number, which prompts him with Hello, how may I help you? He responds with unconstrained fluent speech, upon which basis the call would be routed to an appropriate destination. This scenario can be contrasted with current methods of providing separate telephone numbers for each service or of requiring people to navigate a menu-driven system. In this new scenario, a call would instead be switched on the basis of its content.

The Department Store. Our first experiment in this direction involved a Department Store scenario (Gorin et al., 1991). There were three departments: Furniture, Clothing, and Hardware, plus a fourth call-action where the device "gives up" and connects the call to an operator. Perhaps the best method of exposition is to examine several illustrative human/machine dialogs.

Session One

Hello, this is the J.C. Belt Department Store.

I need some paint for my redwood table.

I will connect you to the hardware department?

Okay.

Please hold on, I am connecting you to the hardware department.

In Session One, the device correctly understood the input, having over time learned to balance the associations of table and paint. The second input, Okay, is a positive reinforcement of the proposed machine action.

Session Two

Hello, how may I help you?

I'd like to buy an etagere.

Maybe should I connect you to the clothing department?

No, it's a kind of furniture.

I will connect you to the furniture department?

Okay, that's better.

Please hold on, I am connecting you to the furniture department.

In Session Two, the device misunderstood the initial request because it has not previously encountered the word etagere. The second input involves the word no, as negative reinforcement, plus a clarifying message. The dialog then converged, leading to the appropriate call-action and network adaptation. The new word is added to the input layer and all connection weights are updated, as was described in Sec. I.

There is a rudimentary confidence model in this early system, reflected in the machine's use of the word maybe. Observe that explicit positive reinforcement is necessary before the call was actually routed. Improvements to this basic dialog controller were introduced in subsequent systems, as discussed later in this section. A subsequent dialog in Session Three again involves etagere, but in a different context, demonstrating the acquisition of the new word and its meaning.

Session Three

Hello, how may I help you?

I'm looking for an etagere.

I will connect you to the furniture department?

Okay.

Please hold on, I am connecting you to the furniture department.

FIG. 3. Vocabulary growth in the department store call router.

An experimental evaluation of the Department Store system with keyboard input was reported in Gorin et al. (1991) and with spoken input in Gorin et al. (1994a). We postpone discussion of the speech system to Sec. VI, focusing here on the system with keyboard input. Initially, the system knows nothing about the vocabulary for its task. It was provided with the concepts of word, phrase, and sentence, but no instantiations thereof for the task. A word was defined to be any character sequence delimited by blanks or punctuation, a phrase to be any adjacent pair of words. The machine was initially provided with the words no and okay plus their associations to negative and positive reinforcement, respectively. An experiment was conducted in which 12 users (colleagues at Murray Hill) interacted with the system over a two month period in a total of 1105 dialogs. The vocabulary growth due to the input sentence of each dialog is shown in Fig. 3.
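The definitions of word and phrase given above, and the way vocabulary grows with each new input sentence, can be sketched as follows. The exact set of punctuation characters and the lower-casing of the input are assumptions made for illustration; the function names are hypothetical.

```python
import re

def words(sentence):
    """Split a sentence into words: character runs delimited by blanks or punctuation.

    The delimiter set and the lower-casing are assumptions for this sketch.
    """
    return re.findall(r'[^\s.,;:!?"()]+', sentence.lower())

def phrases(word_list):
    """A phrase is any adjacent pair of words."""
    return list(zip(word_list, word_list[1:]))

def vocabulary_growth(sentences):
    """Return the vocabulary size after each successive input sentence."""
    vocabulary = set()
    sizes = []
    for sentence in sentences:
        vocabulary.update(words(sentence))
        sizes.append(len(vocabulary))
    return sizes

tokens = words("I need some paint, please!")
pairs = phrases(tokens)
growth = vocabulary_growth(["I need paint", "I need glue"])
```

Plotting the output of a function like `vocabulary_growth` against the dialog index would give a curve of the kind reported in the vocabulary growth figure.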

It is illustrative to examine the association vector for several words, as shown in Table I. To each word, there is a three-component vector comprising that word's mutual information with the call-actions: routing to the Furniture, Clothing, or Hardware Departments. These words were selected for illustration, rather than in any particular order.

As expected, the word sweater has strong positive association to clothing, else negative. Similarly the word glue has strong positive associations to hardware. The word need illustrates an interesting usage pattern, where when someone "needs" something, then it is more likely to be hardware.



TABLE I. Network associations in the department store call router.

GLUE
MOTHER



-2.33 
-3.17 

-1.58 



-i>i.i4 

-J.17 
-OiO 
-^M9 
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-M.34 
+0.50 



FIG. 4. Vocabulary growth in the operator services call router.



The final example in Table I was brought to our attention during a laboratory demonstration, when a visitor provided the input I need a birthday present for my mother, whence the machine confidently offered to connect us to the clothing department. In response to the accusation of constructing a politically incorrect device, we could only respond that this was not preprogrammed: the machine is just a product of its environment.

Operator Services. Based on the above experiment, we became interested in how such methods would apply to real-world data. To this end, a small speech database was collected of actual customer/operator transactions in the AT&T network. The customer input was orthographically transcribed and labeled with one of 21 call-actions. The vocabulary growth over the course of 1140 input sentences is shown in Fig. 4, growing to 911 words. Some of these words are significant for the task, others not. A discussion of this issue is postponed to Sec. V, where we quantify the notion of salience and include tables of the most salient words for this and other tasks.

To each word, there corresponds a 21-component association vector. One interesting example is the word HOME, which is strongly associated with the call-action of third-number billing, typically embedded in an input such as I want to charge this to my home phone please. Another example is the word CHARGE, which is positively associated to the two call-actions of third-number and credit-card billing.

Sankar reported on preliminary experiments in mapping these transcriptions to call-actions (Sankar et al., 1993). An on-line conversational-mode system has been constructed for this task, initially with keyboard input (Miller and Gorin, 1993a), then with spoken input (Gorin et al., 1994a) using the methods of Rose (1993). The dialog controller is improved over that of the Department Store system, exploiting a confidence model as illustrated below. In Session One, the machine encounters an ambiguous input, which is resolved via reinforcement feedback plus clarifying input. In Session Two, the machine has high confidence in its understanding of the input, proceeding directly to the call-action without waiting for explicit external confirmation.

Session One

Hello, how may I help you?
I want to charge this call.

No, to put it on my credit card.

You want to charge this call to your credit card?
Please enter your number.

Session Two

Hello, how may I help you?
I need an emergency operator right now.
Please hold on while I connect you to an operator.

We conclude this section by commenting on other tasks involving sorting natural language into classes. There have been several recent experiments involving topic identification from speech (McDonough et al., 1994) (Peskin, Rohlicek, 1992) (Rose et al., 1991). The task of sorting and routing text data has been addressed by many researchers (Belkin and Croft, 1992; Gertner et al., 1993). These tasks differ from the Call Routing application in several dimensions. First, in Call Routing, the input is provided by a cooperative user who desires to be understood and who is cognizant of the general range of call-actions. Second, the input is typically one sentence, rather than whole paragraphs or conversations. Third, there is the opportunity for dialog with the user, serving to resolve ambiguities and misunderstandings.

IV. STRUCTURED NETWORKS



As a device and its environment become more complex, so does the mapping from input stimuli to action. In order to provide rapid learning and generalization in a language acquisition device, we have proposed utilizing structured networks which reflect the device's input/output periphery and environment. The network structure provides functional constraints on the mapping from stimuli to action, thereby greatly constraining the learning process. We have developed several methods for constructing structured networks, experimentally evaluating these ideas in the Almanac and Blocks World systems in this section, and in the Airline Information system discussed in Sec. VII.

A. Product networks and the Almanac system

In the Call Routing experiments of the previous section, the set of machine actions comprises a simple list. Miller investigated the situation where the set of actions comprises an M-parameter family of subroutines (Miller and Gorin, 1993c). The individual selections of parameter values are denoted semantic primitive actions for the device. If the M parameters are common to all the subroutines, then the action space is isomorphic to the Cartesian product of the semantic primitive actions. Miller then proposes the construction of a product network, where individual networks are assigned to the selection of each semantic primitive value.

FIG. 5. Product network for the Almanac data retrieval system.

A two-dimensional product network was evaluated on an Almanac data retrieval task. In particular, the device knows 20 facts concerning each of the 50 States (in the U.S.A.), and is capable of retrieving any of these 1000 facts. For example, given the input What is the capital of New Jersey?, then an appropriate response would be The capital of New Jersey is Trenton. As illustrated in Fig. 5, two information-theoretic networks are combined in a product network for this task. The input is applied to both networks in parallel, with the resultant activation arrays combined in an outer sum as follows. Denote by $(a_1(n))$, $1 \le n \le 20$, the activation array produced by the attribute-selection subnetwork, and by $(a_2(m))$, $1 \le m \le 50$, the array produced by the state-selection subnetwork. Let $\oplus$ denote the outer sum of two vectors, producing a matrix whose entries are pair-wise sums of the vector components. In particular, we compute the 20 by 50 activation matrix via

$$A = a_1 \oplus a_2, \qquad A(n,m) = a_1(n) + a_2(m). \qquad (11)$$

The action is then performed corresponding to the ordered pair with maximum activation. We remark that for processing the first sentence in a dialog, it would suffice to select the maxima separately for each dimension. As detailed in (Miller and Gorin, 1993c), however, the matrix representation becomes useful in later stages of the dialog, where one must inject negative reinforcement and estimate confidence models.
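The outer sum of the two subnetwork activation arrays, and the selection of the ordered pair with maximum activation, can be sketched as follows. The activation values and the reduced dimensions (three attributes, three states instead of 20 by 50) are invented for illustration, and the function names are hypothetical.

```python
def outer_sum(a1, a2):
    """Matrix of pair-wise sums: A[n][m] = a1[n] + a2[m]."""
    return [[x + y for y in a2] for x in a1]

def best_pair(matrix):
    """Return the (row, column) index pair with maximum activation."""
    cells = ((n, m) for n in range(len(matrix)) for m in range(len(matrix[0])))
    return max(cells, key=lambda nm: matrix[nm[0]][nm[1]])

ATTRIBUTES = ["capital", "land area", "population"]
STATES = ["Iowa", "New Jersey", "Texas"]

# Hypothetical activations for the input "What is the capital of New Jersey?"
a_attribute = [1.2, -0.3, 0.1]
a_state = [-0.5, 2.0, 0.3]

A = outer_sum(a_attribute, a_state)
n, m = best_pair(A)
answer = (ATTRIBUTES[n], STATES[m])
```

For a single input, the maximizing pair coincides with the separate maxima of the two arrays, as remarked in the text; keeping the full matrix is what allows a rejected pair to be penalized later without discarding the rest.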

We describe this system via several illustrative dialogs, as follows.

Session One

HELLO. I KNOW FACTS ABOUT THE FIFTY STATES. WHAT DO YOU WANT TO KNOW?
How big is Iowa?
WHAT DO YOU WANT TO KNOW ABOUT IOWA?
How much land is there?
IOWA'S TOTAL LAND AREA IN SQUARE MILES IS

Session Two

WHAT ELSE DO YOU WANT TO KNOW?
Do you know how big New York is?
DO YOU WANT TO KNOW THE TOTAL LAND AREA OF NEW YORK?




NEW YORK'S TOTAL LAND AREA IN SQUARE MILES IS 49,108.

In Session One, the machine was unfamiliar with the phrase How big is.... Negative reinforcement was generated internally via a confidence model, leading to the machine responding via a partial prompt for the attribute. A subsequent dialog was shown in Session Two, demonstrating the acquisition of those words and their meaning for this task. Observe that the product network has enabled generalization of that meaning to a new state (New York), although no explicit supervision was provided as to which words are associated to which semantic primitive actions. We now illustrate how dialog control can resolve an ambiguous input.

Session Three

WHAT ELSE DO YOU WANT TO KNOW?
What's the main city in California?
DO YOU WANT TO KNOW THE LARGEST CITY OF CALIFORNIA?
No, its capital.
DO YOU WANT TO KNOW THE CAPITAL OF CALIFORNIA?
Yes.
SACRAMENTO IS THE CAPITAL OF CALIFORNIA.

Session Four

WHAT ELSE DO YOU WANT TO KNOW?
What's the main city in Texas?
DO YOU WANT TO KNOW THE CAPITAL OF TEXAS?

AUSTIN IS THE CAPITAL OF TEXAS.

In Session Three, the machine misunderstood the intent of the ambiguous phrase What's the main city.... In the second input, the user provided negative reinforcement (no), plus clarifying information. Observe that the device decides for itself whether the user is rejecting its understanding of the state or attribute value. Session Four demonstrates the acquisition of meaning for that phrase and its generalization to a different state.

An experimental evaluation was conducted of the Almanac system with keyboard input, involving 3 users over a two week period in a total of over 1000 dialogs (Miller and Gorin, 1993c). Figure 6 shows the vocabulary growth due to the initial input sentences over the course of dialog.

FIG. 6. Vocabulary growth in the Almanac system.

B. Sensory primitive subnetworks and the Blocks World

In many situations of interest, the appropriate machine response depends not only on the spoken input but upon the state of the environment. We are thus motivated to investigate devices with multisensory input, which learn the mapping from spoken input plus the state of the world to an appropriate machine action.

Consider a robotic device commanded to pick up the blue cube. The appropriate machine action then depends on where the blue cube is actually located, which must somehow be sensed. For a second example, consider the Automated Call Routing task described in Sec. III. For an input such as Help! This is an emergency!, the appropriate call-destination depends on the physical location of the telephone from which the call was initiated. A third example, from a teleconferencing control task such as HuMaNet (Flanagan et al., 1991), is the spoken command lights, please, to which the appropriate action depends on whether the lights are currently on or off.

Sankar investigated adaptive language acquisition in a multisensory Blocks World (Sankar and Gorin, 1993). The machine was presented with a simulated visual scene, containing several objects of different colors and shapes. In response to input such as Where is the red square?, the device demonstrated its understanding via focusing its "eyeball" on the appropriate object. The action space of this device is parametrized by a two-dimensional continuum of eyeball coordinates, specifying the device's visual focus of attention.

The device was provided with several innate characteristics, implemented via a time-varying potential function. First, it can sense the color and shape of the objects in its visual scene and it is attracted to bright or moving objects. Second, after focusing on some object, it becomes bored and its attraction to that object diminishes over time. Third, it also constructs associations between linguistic and visual events that co-occur temporally, and is then attracted to objects which are strongly associated with its linguistic input.

Sankar proposed the construction of sensory primitive subnetworks which learn associations between the linguistic and visual sensory inputs, implementing the third characteristic above. The outputs of these subnetworks are then combined via a product network, yielding a time-varying potential function over the visual scene. The eyeball motion is then governed by directing it towards the minimum of that potential function. Figures 7(a)-(d) illustrate a sequence of interactions with the system, where it acquires the meaning of the word circle. This conversational mode system, with keyboard input, was evaluated via interaction with eleven users in over 1000 unconstrained natural language dialogs, acquiring a vocabulary of 131 words.

Based on the principle that language acquisition for complex devices should proceed in developmental stages, we observe that visual focus of attention is a prerequisite step to devices which actually manipulate the objects in their field of view. Henis reports on an extension of Sankar's experiment, connecting it to a robotic simulator and demonstrating language acquisition for several manipulatory actions with six sensory input channels (Henis et al., 1994).

In this section we have discussed methods for building structured networks from component subnetworks, which have been experimentally evaluated in an Almanac and a Blocks World system. We now investigate how to exploit such a network of associations to quantify the meaning and information content of language.

V. MEANING, SALIENCE, AND INFORMATION

We have been investigating devices which learn to understand and act upon spoken input. The ultimate goal of these speech understanding systems is to extract meaning from the speech signal. A crucial issue in engineering such devices is to quantify the information content of spoken natural language, then to measure a machine's success in extracting that information.

While information theory is a well-developed discipline, there is the standard caveat that its notion of information is quite different from a layman's. In contrast, for systems which understand spoken language, the semantic aspects of communication are primary. In this section, we propose principles and mechanisms for quantifying the semantic attributes of language. In this and later sections, we discuss the implications of these methods for vocabulary selection in wordspotting, acquisition and adaptation of new spoken words, grammatical inference and robust parsing.

A. Semantic/sensory associations

For people, an input stimulus evokes memories of associated perceptions and activities. We are thus motivated to define the meaning of a word, for a particular device, to be its network associations to the device's input/output periphery. Such a definition grounds meaning in a device's interaction with its world, being dependent on its input/output periphery, environment and experiences. We now describe illustrative examples of this connectionist representation of meaning in several experimental systems.

In the simplest case of a single-layer network, each word-node is connected to each output node, so that its network associations comprise an M-dimensional vector (where M is the number of output actions for the device). For the Department Store Call Router described in Sec. III, there is a three-component vector for each word, as was illustrated in Table I. For the Operator Services Call Router, also introduced in Sec. III, the network associations comprise a 21-component vector. An illustrative example from that task is the association vector for the word charge, which is most strongly associated with third-number billing, has mild associations to credit-card billing, and is inhibitory of the other call-actions.

Miller's Almanac data retrieval system is based on a product network architecture, as was described in Sec. IV. In this case, the associations between words and actions are indirect, being factored through the semantic primitives of attribute- and state-selection. For a product network, the word/action network associations are fully determined by the associations between words and primitives. An illustrative example from that system is the word Colorado. Its associations within the state-selection subnetwork are quite predictable, being strongly associated to one state and highly inhibitory of the others. While one might expect Colorado to be null for the attribute-selection subnet, it turns out to be moderately associated to requests for highest mountains (not surprising, in retrospect). For Oz fans, it is interesting to observe the strong association between the word Dorothy and queries about Kansas. We recall that these associations were acquired by the device during its interactions with many users over a period of time.

In Sankar's Blocks World experiment, there are network associations between a word and the visual input periphery, factored through the color and shape sensory primitives. In Henis's extension of that Blocks World, the associations involve six visual features and three machine actions. One could argue that as a device's input/output periphery becomes more anthropomorphic, so will this representation of meaning. This will remain a conjecture, however, until we can construct sufficiently complex devices to test the hypothesis.

B. Salience

Given the representation of meaning via network association vectors, a semantic distortion measure can be introduced between such vectors. A null word for a device is one whose network associations are all zero. The salience of a word for that device can then be defined via its distortion from the null word, thus providing a "norm" on the network association vectors.

We now focus our attention on a single-layer information-theoretic network and quantify these intuitions. In this case, the network associations of a word comprise a vector of mutual informations between that word and the various machine actions. There are many possible distance measures that one could explore, but it is advantageous to exploit the information-theoretic nature of the vectors. Given a word, $v$, denote its network association vector as $\langle I(v,c_k)\rangle$, where $\langle\cdot\rangle$ denotes a vector whose $k$th component is $I(v,c_k)$. We define a semantic distortion measure $d_s(v_1,v_2)$ between two words $v_1$ and $v_2$ via first computing the difference between these vectors, then converting that difference into a scalar via projection onto some vector $s$. Denote $w_1 = \langle I(v_1,c_k)\rangle$ and $w_2 = \langle I(v_2,c_k)\rangle$, then define

$$d_s(v_1,v_2) = (w_1 - w_2)\cdot s. \qquad (12)$$

In particular, if we select $s$ to be the vector of a posteriori probabilities $\langle P(c_k|v_1)\rangle$, then

$$d(v_1,v_2) = \sum_k P(c_k|v_1)\,[I(v_1,c_k) - I(v_2,c_k)]. \qquad (13)$$

In addition to its geometric interpretation as the scalar projection of a vector difference, this distortion also has an information-theoretic interpretation. It can be easily seen that formula (13) is equivalent to the Kullback-Leibler distance (a.k.a. relative entropy) between the a posteriori distributions $\langle P(c_k|v_1)\rangle$ and $\langle P(c_k|v_2)\rangle$ (Cover and Thomas, 1991). That is, the semantic distortion between the two words is equivalent to the distance between the distributions that they induce on the network's periphery.

Recall that a null word, $v_0$, is one whose association vector is all zeros. The salience of a word for a given device is defined as its semantic distortion from $v_0$,

$$\mathrm{sal}(v) = d(v,v_0) = \sum_k P(c_k|v)\,I(v,c_k). \qquad (14)$$

It was shown by Blachman that this is the unique non-negative measure of how much information a value of one random variable provides about a second one (Blachman, 1968). That is, denoting the random variable of machine actions by $C$, then $\mathrm{sal}(v)$ is a measure of how much information the word $v$ provides about $C$. Thus, salience provides an information-theoretic measure of how meaningful a word is for a particular device. After describing a few examples of this measure, we will empirically distinguish it from Shannon's measure of the information content of a word.
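The relationship between salience, semantic distortion and relative entropy can be checked numerically with a short sketch. It assumes the mutual information weights are $I(v,c_k) = \log_2 P(c_k|v)/P(c_k)$, so that a null word is one whose a posteriori distribution equals the prior. The three-action prior and the distribution for the word labeled `home` are invented for illustration.

```python
import math

def salience(posterior, prior):
    """sal(v) = sum_k P(c_k|v) * I(v, c_k), with I = log2(P(c_k|v) / P(c_k))."""
    return sum(p * math.log2(p / q) for p, q in zip(posterior, prior) if p > 0.0)

def semantic_distortion(post1, post2, prior):
    """d(v1, v2) = sum_k P(c_k|v1) * (I(v1, c_k) - I(v2, c_k)).

    Requires P(c_k|v2) > 0 wherever P(c_k|v1) > 0.
    """
    return sum(p1 * (math.log2(p1 / q) - math.log2(p2 / q))
               for p1, p2, q in zip(post1, post2, prior) if p1 > 0.0)

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0.0)

prior = [0.5, 0.25, 0.25]        # hypothetical P(c_k)
home = [0.9, 0.05, 0.05]         # hypothetical P(c_k | "home")
null_word = list(prior)          # a null word leaves the prior unchanged

sal_home = salience(home, prior)
sal_null = salience(null_word, prior)
distortion = semantic_distortion(home, null_word, prior)
```

Here the null word has zero salience, and the distortion of `home` from the null word equals both its salience and the relative entropy between the two a posteriori distributions, up to floating-point rounding.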

Illustrative examples of the most salient words for the Operator Services task are given in Table II. The primary call-action is the one with maximum association to that word. In those cases where there is a moderately strong secondary association, it is also included. The most salient word, HOME, almost always occurs in a single call-action, for example in phrases such as bill this to my home phone.
TABLE II. Salient words in the operator services task.

Word       Salience   Primary call-action   Secondary call-action
HOME       …          …                     …
CREDIT     2.19       …                     …
CHARGE     1.70       …                     …
AT&T       …          …                     …
CODE       …          …                     …
PHONE      …          …                     …
CALLING    …          …                     …
MY         …          …                     …
FROM       …          …                     …

Sankar showed that one can induce a subvocabulary for this Call Routing task via […], i.e., clipping the connection weights of low-salience words to zero (Sankar et al., 1993). In that experiment, it was shown that this weight-clipping actually improves the understanding rate from transcriptions. We conjecture that this result is due to the increased variance of the […] estimates for low-salience words. Most recently, we have reported on […] to the output of a continuous speech recognizer (Gorin et al., 1994b). One might also consider exploiting such methods for reducing the vocabulary of a speech recognizer in order to produce a […]. Such issues have been addressed in (McDonough and Gish, 1994) (Peskin et al., 19…) but remain subjects for future research.

The previous example was for a device with a single-layer network. For devices with more complex network architectures and peripheries, such as the Almanac and Blocks World, we can separately measure the salience of words for each of their semantic and sensory primitive subnetworks. Table III contains some salient words for the attribute-selection subnetwork of the Almanac system. Table IV shows some of the most salient words for the color sensory primitive subnetwork of the Blocks World system, where the multilingual content is striking.

TABLE III. Salient words for the attribute-selection subnetwork of the Almanac system.

Words, in rank order of salience: NICKNAME, SONG, ELEVATION, WHEN, INDUSTRY, WHO, BECOME, TAKE, LEADING, LARGEST.

TABLE IV. Most salient words for the color sensory primitive.

Word       Salience
YELLOW     …
BLUE       …
LIME       …
AZUL       1.70
…          1.64
SKY        1.50
MAROON     1.44
BLOOD      1.37
…          …
GRASSY     …
LAL        …
…          …
…          …
…          …
CRIMSON    0.97

C. Salience versus information

We briefly distinguish and compare the notions of salience and information. The entropy of a language measures how much information is produced, on the average, for each symbol in the language (Shannon, 1948). For the sake of exposition, let us consider single words at a time, ignoring issues involving correlations between adjacent words. The information content of a word is defined as

$$i(v) = -\log_2 P(v), \tag{15}$$

whose average over all words is an approximation of the language entropy. Both entropy and $i(v)$ are measured in bits per word.

We first observe that these definitions involve only the language itself. For example, given War and Peace in the original Russian, one could compute the information content of individual words and the entropy of the language, without ever understanding a word. In contrast, computing salience involves both the language and its extra-linguistic associations to a device's environment.

It is illustrative to empirically quantify this distinction in the Operator Services task (cf. Section III). First, compute $i(v)$ for each of the words, using relative frequency estimates of their probabilities. The distribution of the $i(v)$ is shown in Fig. 8, plotted on a log scale. The near-linear form of this distribution is not capricious, and can be related (Gorin, …) to Zipf's law (Zipf, 1949). Similarly, we also compute $\mathrm{sal}(v)$ for each word, whose distribution is shown in Fig. 9. A scatter plot of salience versus information content for this database is shown in Fig. 10. It is clear that while these measures are quite different, there are relationships between the two that are reflected in the structure of the scatter plot. Characterization and exploitation of this structure is a subject for future research.
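
To make the contrast concrete, the short Python sketch below computes the information content $i(v)$ of formula 15 from relative word frequencies, together with its probability-weighted average as a unigram approximation of the entropy. Only the text itself is needed, with no reference to any device action. The function names and the toy word list are assumptions made for illustration.

```python
import math
from collections import Counter

def information_content(words):
    """i(v) = -log2 P(v), with P(v) estimated by relative frequency."""
    counts = Counter(words)
    total = sum(counts.values())
    return {v: -math.log2(n / total) for v, n in counts.items()}

def unigram_entropy(words):
    """Probability-weighted average of i(v), in bits per word."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical toy corpus: one frequent word and two rare ones.
corpus = ["a", "a", "b", "c"]
info = information_content(corpus)
entropy = unigram_entropy(corpus)
```

A frequent word carries few bits under this measure regardless of whether it matters to the task, which is exactly why it can disagree with salience.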

FIG. 8. Information content in the operator services vocabulary.

FIG. 9. Salience in the operator services vocabulary.

FIG. 10. Salience versus information content in the operator services vocabulary.

Summary. In this section, we have explored how to define meaning via semantic/sensory associations, then quantified such intuitions via a salience theory. In the next section, we will comment upon how one might exploit salience in the acquisition and adaptation of spoken words.

VI. EXPERIMENTS IN SPOKEN LANGUAGE ACQUISITION

In traditional speech understanding systems, the symbols and structure in the linguistic hierarchy are defined a priori. For humans, however, the symbols of language seem to emerge naturally during the course of a child's interaction with his environment (Kuhl, 1992). This in turn motivates us to investigate how to mimic such behavior in our machines, then how to exploit it to improve the tractability, adaptability, and robustness of our speech understanding systems.

In this section, we report on experiments in automated acquisition of spoken words, which are the fundamental unit of meaning in language. Albeit rudimentary, the systems are unique in that there is no text utilized during training or evaluation, in contrast with all other spoken language systems (Levinson and Shipley, 1980) (Pieraccini, 1992) (Rabiner and Juang, 1993) (Ward, 1991) (Zue, 1992). Our experiments are also unique in that the vocabulary words and their meanings are acquired automatically, in contrast to all other systems where the salient vocabulary is predefined.

The vehicles for these speech experiments are the Department Store and Almanac tasks, which were introduced as keyboard-based systems in Secs. III and IV, respectively. In each case, the input was constrained to sequences of isolated spoken words. For each task, we had previously recorded many keyboard dialogs from multiple users. The initial input from each dialog was then recorded by a single speaker in an office environment. These utterances plus their corresponding semantic actions provided the database for these experiments. In the case of the Department Store, the resultant network was then embedded back into a conversational-mode system with speech input and output, which acquired new words and adapted known ones during the course of performing its task.

A. New word acquisition

The utterances were segmented into individual word tokens via their energy contour (Wilpon et al., 19…). Feature extraction comprised 12 cepstral and 12 delta-cepstral coefficients at 10-ms intervals (Rabiner et al., 1989). As detailed in (Gorin et al., 1994a), there are two stages in the incremental training algorithm for each utterance/action pair. First is an adaptive clustering of the new word tokens into existing word-nodes, possibly creating one or more new word-nodes. Word tokens were compared via a Dynamic Time Warping (DTW) measure (Itakura, 1975) with a local Euclidean distance. Second, the information-theoretic connection weights are updated via the methods of Sec. I.
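
The first stage can be sketched in a few lines of Python. This is a generic textbook DTW with a local Euclidean distance and a simple nearest-cluster assignment rule; the step pattern, the absence of slope constraints, the `threshold` parameter, and the function names are all assumptions for illustration and are not claimed to match the Itakura measure or the clustering rule used in the experiments.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] holds the best accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

def assign(token, clusters, threshold):
    """Add token to the nearest cluster, or start a new word-node.

    Nearness is the minimum DTW distance to any token already in the cluster.
    Returns the index of the cluster that received the token.
    """
    best, best_d = None, math.inf
    for idx, members in enumerate(clusters):
        d = min(dtw_distance(token, t) for t in members)
        if best_d > d:
            best, best_d = idx, d
    if best is None or best_d > threshold:
        clusters.append([token])
        return len(clusters) - 1
    clusters[best].append(token)
    return best
```

With a rule of this kind, a threshold that is too small splits tokens of one word into several clusters, while one that is too large merges tokens of different words.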

The word-nodes in these experiments are represented via a cluster of spoken word tokens, providing the device with an acquired acoustic vocabulary. Since no text is provided to the system, it is illuminating to compare the acoustic vocabulary with the corresponding text vocabulary. There are two kinds of "errors" that occur at this level. The first type of error is when word-tokens with different orthography are merged into a single acoustic cluster. The second is when word-tokens with the same orthography are split into separate acoustic clusters.

Figure 11 illustrates the acoustic vocabulary growth for the Almanac experiment (Miller and Gorin, 1993b), compared with the corresponding text-vocabulary growth. It is clear from this plot that the split phenomenon dominates, at least for this particular clustering algorithm and parameter settings. The same situation is observed in the spoken Department Store experiment (Gorin et al., 1994a).

FIG. 11. Acoustic vocabulary growth in the Almanac task.

We quantify the split phenomenon by computing the number of acoustic clusters per text-word, which is histogrammed in Fig. 12 for the Almanac task. Observe that most words are in only one or two clusters. Most of these splits are observed for words such as a, the, in, what, and state (Miller and Gorin, 1993b). One can also analyze the merge phenomenon, yielding a similar distribution with an average of 1.15 text-words per acoustic cluster. Table V provides analysis of these merged words for the Almanac system. Observe that more than half are due to misspellings and abbreviations in the text: errors in name only, not in fact. The first row in Table V is partially an artifact of "read speech," where text such as … and … were pronounced identically. Table VI provides, for the Department Store task, a listing of various acoustic clusters which contain more than one text-word. Observe that some of these merges are semantically significant for the task (e.g., bread/red) while others are not (e.g., …).

FIG. 12. Distribution of split words in the Almanac task.

TABLE V. Analysis of merged words in the Almanac acoustic vocabulary. Columns: classification, percentage, examples. Legible classifications: misspellings and abbreviations; plurals and …; homophones (example: road, Rhode).

B. Joint acoustic/semantic distortions

These split/merge phenomena are quite sensitive to threshold-parameter values in the adaptive clustering algorithm, as discussed in (Gorin et al., 1994a). Presumably, with much improved feature analysis and distortion measures, the acoustic and text lexicons should become nearly identical (modulo homonyms and homophones). For humans, however, we observe that our perceptual metric is a partially acquired trait. That is, there are sound distinctions which are perceived (or not) depending on where an individual was raised (Kuhl, 1993).

TABLE VI. Some examples of merged words in the department store vocabulary. Legible entries: bed, bit, blue, bread, to, head, red, does, tree, fine, three, that's, pillow.

This motivates us to investigate how to exploit nonacoustic information to improve the acquisition and adaptation of spoken words. Although a topic for future research, we make some preliminary observations in this direction. In particular, one could investigate clustering methods which acquire spoken words in the composite acoustic/semantic space. For each pair of text-vocabulary words in the Department Store task, we compute their acoustic distortion via a minimum nearest-neighbor distance between the spectral templates in their respective clusters. Then, for each such word-pair, we also compute a symmetrized semantic distortion using the methods of Sec. V.

(16)

A scatter plot of these two distortions is shown in Fig. 13, along with projections onto the individual axes. The acoustic distortion distribution is essentially Gaussian, as discussed further in Gorin et al. (1994a). The semantic distortion distribution is exponential, and is nearly linear if examined on a log scale.

FIG. 13. Joint acoustic/semantic distortions in department store vocabulary.

In the experiments of Gorin et al. (1994a) and Miller and Gorin (1993b), a purely acoustic criterion was used for adaptation and acquisition of spoken words, corresponding to a vertical decision boundary in the two-dimensional histogram of Fig. 13. How might one exploit semantic-level supervision via this joint distribution to improve the acquisition process? We conjecture that a tilted decision boundary might separate semantically distinct words which are acoustically similar (cf. Table VI). While an intriguing possibility, this remains a subject for future research.
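
The conjecture can be pictured with a small Python sketch. It assumes, purely for illustration, that the symmetrized semantic distortion is the average of the two directed Kullback-Leibler distortions, and that the tilted boundary is a straight line in the (acoustic, semantic) plane with a hypothetical `tilt` parameter. Neither choice is taken from the experiments, and the function names are invented here.

```python
import math

def kl_bits(p, q):
    """Directed KL distortion between two action distributions, in bits."""
    total = 0.0
    for c, pc in p.items():
        if pc > 0:
            qc = q.get(c, 0.0)
            if qc == 0:
                return math.inf
            total += pc * math.log2(pc / qc)
    return total

def symmetrized_distortion(p, q):
    """Assumed symmetrization: average of the two directed distortions."""
    return 0.5 * (kl_bits(p, q) + kl_bits(q, p))

def should_merge(acoustic, semantic, threshold, tilt=0.0):
    """Merge two clusters when they fall on the near side of the boundary.

    tilt = 0 gives the purely acoustic (vertical) boundary; tilt > 0 makes
    semantically distant pairs harder to merge at the same acoustic distance.
    """
    return threshold > acoustic + tilt * semantic

# Hypothetical posteriors over two actions for two acoustically similar words.
p = {"furniture": 0.75, "clothing": 0.25}
q = {"furniture": 0.25, "clothing": 0.75}
semantic = symmetrized_distortion(p, q)
```

With `tilt=0.0` a pair at acoustic distance 1.0 is merged under a threshold of 1.5 no matter how different the meanings are, whereas a positive tilt can keep the same pair apart.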

VII. GRAMMATICAL INFERENCE

In the previous section, we described some intuitions and experimental results involving automated acquisition of spoken words. The next level up in the linguistic hierarchy (Levinson, 1985) is grammar, traditionally viewed as comprising the rules which constrain how words are put together into sentences (Winograd, 1983). Grammatical structure also modulates the semantic associations of a word, adjusting its meaning depending on where it appears in a sentence. For example, within the Department Store Call Router, the network associations of the word chair should be quite different in I want to buy a chair versus in the sentence I need some glue to fix my chair. In the experiments described thus far, the network associations of a word or phrase have been context-independent. Hence, a natural next step is to extend these methods to encompass context-dependent associations of a word when it appears in a particular part-of-speech. An intimately related question is how such parts-of-speech might be automatically acquired. In this section, we recount some intuitions and preliminary experiments in these directions.

There has been much debate over the years over how much of linguistic structure is innate versus acquired (Chomsky, 1965). In particular, grammatical inference from samples of the language has received much attention. However, this is a much more difficult problem than people actually solve, who acquire language not only by listening to it but by using it during the course of interacting with their environment. This contrast motivates us to investigate how to exploit such extra-linguistic information in automated language acquisition, governed by the machine's desire to understand and respond appropriately to its input.

We are motivated by human language acquisition, in which the early characterization of a part-of-speech is semantic/sensory, hence grounded in our physical environment. For example, the elementary-school definition of a noun is a "person, place, or thing" and that of a verb is a word which connotes action. One can similarly provide semantic/sensory definitions of adjectives, adverbs, etc. While a subject of debate in linguistics, these definitions underlie our intuition of how to exploit semantic/sensory associations to bootstrap grammatical inference. In particular, we propose to define a part-of-speech, for some device, as a set of words which are strongly associated to some dimension of the device I/O periphery. In formal language theory (Aho and Ullman, 1972), one denotes the vocabulary words as the terminal symbols, while parts-of-speech and phrases are described via nonterminals. Parts-of-speech are a particular type of nonterminal which describes a word-class, sometimes denoted preterminal (Pereira, 19…).

Our most elementary systems are based on a single-layer network where the input periphery involves only words and the output periphery is a set of meaningful machine actions. In Sec. V, we quantified the notion of salience, then used it to rank-order the vocabulary for several tasks in Tables II, III, and IV. By selecting a threshold, one can induce a salient
preterminal for a task, which we denote by the symbol $S$. That is, the family of salient words are exemplars of $S$. One can exploit this preterminal to modulate semantic associations via suppressing the associations of nonsalient words. In particular, denote by $I(v, S \mid c)$ the context-dependent associations to action $c$ of a word $v$ produced by preterminal $S$. Denote by $\bar{S}$ the nonsalient preterminal, which is in this case just the complement of $S$. We can modulate the semantic associations of a word $v$ by defining

(17)
(18)

where $I(v, c)$ is the context-independent mutual information described in Sec. III. Sankar showed (Sankar et al., 1993) that this simple modulation leads to an improved understanding rate in the Operator Services task. Furthermore, $S$ can define the subvocabulary of a wordspotting front-end for the task.

In more complex devices, one can induce several preterminals in this manner. For example, in Miller's Almanac data retrieval system, one can induce one preterminal which is salient for state-selection, another which is salient for attribute-selection (cf. Table III). In Sankar's Blocks World system, one can induce salient preterminals for the color and shape sensory primitives. In Henis' extension of the Blocks World, there are several sensory and semantic primitive layers, leading to a corresponding number of preterminals. Recently, Masukata and Nakagawa (1994) have reported on grammatical inference exploiting speech and visual input in a Blocks World. There are many interesting issues which arise in grammatical inference for such devices. In this paper, however, we report only on some initial experiments for exploiting extra-linguistic associations for grammatical inference in the Call Routing tasks.


A. Trigrams of salient words

Trigrams are an elementary type of grammar, describing the allowable (or probable) adjacent symbol pairs (Jelinek, 1990). There are well-known arguments against the sufficiency of even n-grams on terminal symbols to describe natural language (Chomsky, 1965). Utilizing rules involving trigrams on nonterminals, however, is another matter, providing a much more powerful representation (Aho and Ullman, 1972).

We briefly explore this intuition by examining the trigram of salient words for the Operator Services task (Lee, 1993). Figure 14 shows the left and right context of the word home, which is strongly associated with the function of third-number billing. The • node indicates a wildcard, i.e., any other word. Observe that home is preceded by the word my with probability 0.94, and followed by the word phone with probability 0.?3. This low branching factor is also observed for the other salient words. Such highly constrained local context can be exploited in speech recognition to improve performance of a wordspotting algorithm (Rose, 1993). For example, although it is difficult to spot the monosyllable home in fluent speech, it is much easier to spot the phrase "my home phone" (Hanek, 1994).

FIG. 14. Local context of the salient word home.

Figure 15 shows the context of the salient word AT&T, which is slightly more varied. In this example, the context can modulate the meaning of the word. Recall that we have defined the meaning of a word, for some device, to be its network associations to the device periphery. In this particular device, meaning reduces to the vector of associations between the word/phrase and call-actions. For example, when AT&T appears in the context "... my AT&T card ..." it should be associated with the call-action of credit-card billing. Alternatively, when it appears in the context "... AT&T long ..." it should be associated with accessing AT&T as a long-distance carrier. It remains a subject for future research to experimentally evaluate the utility of these acquired semantic fragments.
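
The local-context statistics of Figs. 14 and 15 can be approximated by counting the immediate left and right neighbors of a salient word. The Python sketch below does this by relative frequency; the boundary markers "^" and "$", the whitespace tokenization, the function name, and the toy sentences are assumptions for illustration, not the procedure used to produce the figures.

```python
from collections import Counter

def local_context(sentences, target):
    """Relative frequencies of the words immediately left and right of target."""
    left, right = Counter(), Counter()
    for sentence in sentences:
        # "^" and "$" mark the sentence start and end (assumed markers)
        tokens = ["^"] + sentence.lower().split() + ["$"]
        for i, tok in enumerate(tokens):
            if tok == target:
                left[tokens[i - 1]] += 1
                right[tokens[i + 1]] += 1
    n = sum(left.values())
    if n == 0:
        return {}, {}
    return ({w: k / n for w, k in left.items()},
            {w: k / n for w, k in right.items()})

# Hypothetical toy transcriptions.
sentences = ["bill this to my home phone",
             "charge my home phone",
             "call home please"]
left, right = local_context(sentences, "home")
```

A low branching factor shows up as one neighbor carrying most of the probability mass on each side, which is what makes the surrounding phrase easier to spot than the word alone.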



B. A finite state grammar with salient states

In natural language, parts-of-speech are characterized both semantically and via within-language usage patterns (Maratsos, 1982). This motivates us to explore evolving our purely semantic characterization of nonterminals in a similar manner. In particular, we report on an early experiment which splits and reestimates salient nonterminals based on within-language patterns.




FIG. 15. Local context of AT&T.

TABLE VII. Salient words for the department store system.

Words, in rank order of salience: …, DRESS, SOCKS, SILK, HAMMER, …, WRENCHES, POLISH, ROOM, WEAR, BLACK, FIX, WOODEN.
Legible salience values, in rank order: 1.30, 1.26, 1.21, 1.13, 1.13, 1.12, 1.10, 1.08, 1.05, …, 1.02, 1.01, …, 0.81, 0.80, ….

Table VII shows some of the most salient words for the Department Store task. We split […] training and 305 test sentences, inducing a preterminal $S$ via salience thresholding on the training data. […] Vidal's method of Error Correcting Grammatical Inference (ECGI) (Prieto and Vidal, 1992) (Rulot and Vidal, …) to induce a finite state grammar with 18 salient states, a pruned version of which is shown in Fig. 16. Based on within-language context, the single salient state $S$ is split and refined into 18 separate states. Five of these states appear in the pruned grammar of Fig. 16, denoted […]. The start-state is indicated by the ^ symbol, the stop-state by […]. The graph is directed, with transitions from left to right.

One method of exploiting this grammar would be to modulate network associations depending on whether a word is parsed into a salient state or not, following formulas (17) and (18). To evaluate the feasibility of this approach, a first step is to evaluate whether the grammar does a better job of salience-tagging than merely salience thresholding. We observe that, in the Department Store task, most (64%) of the singleton words are subjectively salient, since participants in the experiment were continually inventing new items to "purchase." To define a salient preterminal, the salience-threshold was thus adjusted to include those words of frequency one. The error of that naive salience-tagger on new words is then 36%.

By way of comparison, the 305 test sentences were analyzed via an error-correcting parser (Prieto and Vidal, 1992) and the complete induced grammar. The words in the test sentences were tagged as salient or not, depending on the state to which they were assigned. In particular, we focus on the dominant salient state, to which 81 new words were assigned. A subjective evaluation showed that only 5 are nonsalient, so that the error-rate of the salience-tagger for the dominant state is only 6%, a substantial reduction as compared to the naive salience-tagger described above. While quite preliminary, this experiment indicates the utility of the induced grammar to improve salience-tagging over context-independent salience-thresholding.

This automated salience tagging can also be useful when acquiring new words, for which the estimation of mutual information is quite noisy (cf. Sec. I). One would like to focus the learning algorithm based on syntactic state. For example, one of the test sentences is Do you sell (?)(S) flashlights?, where flashlights is a new word (indicated by (?)) which is aligned by the parser to a salient state (indicated by (S)). Another example is Do you have (?) beanbag (S) chairs?, where beanbag is a new word which is not assigned to a salient state. A topic for future research is to provide improved estimators for low-frequency words by exploiting such salience-tagging, accelerating the learning rate for flashlights and slowing it down for beanbag.

C. An airline information system

Consider a service whose actions involve pairs of places or things. The principle of developmental learning leads us to construct a device which acquires the language in stages, first learning the language referring to individual places, only then acquiring the language involving pairs. Gertner constructed a hierarchical network for such tasks, analogous to Fig. 2, except that each node is replaced by a subnetwork corresponding to a linguistic nonterminal (Gertner and Gorin, 1993). The network is hierarchically self-similar, in that each subnetwork is itself a copy of Fig. 2, with nodes corresponding to vocabulary words (terminals) in the language. The subnetworks correspond to nonterminals for place/object names (cf. nouns) and modifier phrases. The construction of associations between phrases and actions is factored through these intermediate nonterminals, providing rapid learning and generalization.

This network was evaluated in an Airline Information Task, a conventional domain for investigation of natural language understanding systems (Levinson and Shipley, 1980) (Ward, 1991) (Zue, 1992) (Woods, 19…). In particular, the task involved retrieval of a flight table between pairs of cities. We describe the system via several illustrative dialogs as follows (Gertner and Gorin, 1993). The flight tables are not actually retrieved, being illustrated by a sequence of asterisks.

Session One illustrates the dialog control system, which resolves an ambiguity in the initial input sentence.

Session One

Machine: HELLO. I CAN SHOW YOU THE FLIGHT TABLE BETWEEN ANY PAIR OF CITIES. HOW MAY I HELP YOU?
User: … Detroit.
Machine: WOULD YOU LIKE TO SEE THE FLIGHT TABLE FROM BOSTON TO DETROIT?
User: …
Machine: WOULD YOU LIKE TO SEE THE FLIGHT TABLE FROM NYC TO DETROIT?
User: …
Machine: HERE IS THE LIST OF THE FLIGHTS FROM NYC TO DETROIT.
****

Sessions Two and Three illustrate the acquisition of modifier phrases that determine which is the destination and which is the origin of the flight. For the sake of an interesting exercise, we've chosen the input to involve the Nepali language, which uses postpositional rather than prepositional modifiers (…, 1990). After Session Two, any reasonable network would have memorized the association between that particular sentence and the appropriate action. A […] network would even be able to generalize that the phrase Chicago bata is associated with Chicago being the departure city. One would desire, however, that the system would furthermore learn that […] modify the semantic associations of any city name, not just the ones with which they were observed. Session Three illustrates this generalization, demonstrating the efficacy of factoring the word/action associations through the intermediate nonterminals.

Session Two

Machine: ARE YOU INTERESTED IN OTHER …?
User: …
Machine: WOULD YOU LIKE TO SEE THE FLIGHT TABLE FROM DETROIT TO CHICAGO?
User: No, the other way around.
Machine: I WILL SHOW YOU THE FLIGHT TABLE FROM CHICAGO TO DETROIT.
User: Yes.
Machine: HERE IS THE LIST OF THE FLIGHTS FROM CHICAGO TO DETROIT.
****

Session Three

Machine: ARE YOU INTERESTED IN OTHER …?
User: …
Machine: I WILL SHOW YOU THE FLIGHT TABLE FROM BOSTON TO WASHINGTON.
User: …
Machine: HERE IS THE LIST OF THE FLIGHTS FROM BOSTON TO WASHINGTON.
****

VIII. CONCLUSIONS

We have described the principles and mechanisms underlying our research in automated language acquisition, exploring their utility in several experimental systems involving both text and spoken input. The basic mechanism in these experiments has been an information-theoretic connectionist network embedded in a feedback control system. This basic network was evaluated in the context of an Automated Call Routing task. For language acquisition in more complex devices, we first introduced the idea of network architectures which reflect the structure of the device's input/output periphery and environment. We then introduced the idea of developmental learning, where a device proceeds from the simple to the complex, from the concrete to the abstract. These ideas were experimentally evaluated in the context of an Almanac Data Retrieval task, a multisensory Blocks World, and an Airline Information task.

We have proposed a salience theory which quantifies the information content of spoken language for a particular device, defining meaning in terms of network associations to the device periphery. Preliminary experiments were presented demonstrating the feasibility of exploiting such meaning to improve spoken word acquisition and grammatical inference.

There are several main directions for future research. The first is to demonstrate scalability of these methods to larger tasks such as teleconference facility control, database retrieval and robotic control. The second is to improve performance on "simpler" tasks such as automated call routing and data retrieval, advancing our understanding of how to integrate these methods with wordspotting and large-vocabulary speech recognition. The third direction is to demonstrate how meaning can be exploited to improve the acquisition of robust models of spoken language, in particular at the levels of subword units, words and grammar.

This research forms the basis of a theory of syntax and semantics, where conveying meaning is primary and linguistic structure serves to make such communication robust. Although our experimental devices are thus far rudimentary, we consider them to be the early stages of a long-term investigation into machines which automatically acquire language through interaction with a complex environment.